
RL Configuration Guide (with LiftObj as an example)

This page explains how reinforcement learning (RL) environments are defined and configured in LW-BenchHub, using the LiftObj task as a concrete example. It covers how robot and task RL configs are composed, where the configurations live, and how to train and evaluate with SKRL.

High-level composition

  • An RL environment is composed from:
    • A robot RL variant (actions, frames, sensors) under lw_benchhub/core/robots/<robot>/
    • A task RL environment config (observations, rewards, events, curriculum) under lw_benchhub_rl/<task>/
    • A scene/layout (kitchen variants or USD)
  • Both robot and task RL configs are registered in lw_benchhub_rl/__init__.py using a helper, which also links the SKRL agent config (YAML).

Where things are (LiftObj)

  • Task RL env config (Python): lw_benchhub_rl/lift_obj/lift_obj.py
  • Robot RL variant (Python): lw_benchhub/core/robots/unitree/g1.py (class UnitreeG1HandEnvRLCfg)
  • RL environment registration: lw_benchhub_rl/__init__.py
  • Global RL presets (YAML): configs/rl/skrl/rl_base.yml
  • Task preset (YAML): configs/rl/skrl/g1_liftobj.yaml
  • SKRL agent config (YAML): lw_benchhub_rl/lift_obj/agents/skrl_ppo_cfg.yaml
  • Training/Eval scripts: lw_benchhub/scripts/rl/train.py, lw_benchhub/scripts/rl/play.py, shell wrappers train.sh, eval.sh

Robot RL variant (G1-RL)

The robot RL variant adapts the robot for RL control, e.g. exposing action heads as joint-position targets and defining end-effector frames. For Unitree G1-Right-Hand:

File: lw_benchhub/core/robots/unitree/g1.py

Key points in UnitreeG1HandEnvRLCfg:

  • Sets robot_name = "G1-RL" for RL usage
  • Replaces teleop actions with RL action head (e.g., JointPositionActionCfg for the right arm joints)
  • Defines ee_frame (FrameTransformer) with targets for TCP and fingertips, used in observations/rewards
  • Sets reward-related joint name patterns for gripper and arm

Effect: The robot provides the correct action/observation interfaces for the RL task and rewards.
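
For orientation, a condensed sketch of what such a variant can look like is shown below. Only robot_name = "G1-RL", the JointPositionActionCfg action head, the ee_frame FrameTransformer, and the right_hand_.* / right_wrist_yaw_link names come from this page; the class layout, joint patterns, and prim paths are illustrative assumptions, not the repository's actual definition.

from isaaclab.envs.mdp import JointPositionActionCfg
from isaaclab.sensors import FrameTransformerCfg
from isaaclab.utils import configclass


@configclass
class UnitreeG1HandRLSketchCfg:
    """Illustrative stand-in for UnitreeG1HandEnvRLCfg (base class and fields simplified)."""

    # Robot name used to select the RL variant (e.g. robot: G1-RL in the preset)
    robot_name: str = "G1-RL"

    # RL action head: joint-position targets for the right arm (joint patterns illustrative)
    arm_action = JointPositionActionCfg(
        asset_name="robot",
        joint_names=["right_shoulder_.*", "right_elbow_.*", "right_wrist_.*"],
        scale=0.5,
    )

    # End-effector frames consumed by observations/rewards (prim paths illustrative)
    ee_frame = FrameTransformerCfg(
        prim_path="{ENV_REGEX_NS}/Robot/pelvis",
        target_frames=[
            FrameTransformerCfg.FrameCfg(
                prim_path="{ENV_REGEX_NS}/Robot/right_wrist_yaw_link", name="tcp"
            ),
        ],
    )

    # Joint-name patterns referenced by gripper/arm reward terms
    gripper_joint_names = ["right_hand_.*"]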


Task RL env config (LiftObj)

File: lw_benchhub_rl/lift_obj/lift_obj.py

The LiftObj RL config composes the base task with RL-specific managers.

Commands

@configclass
class CommandsCfg:
    object_pose = mdp.UniformPoseCommandCfg(
        asset_name="robot",
        body_name=MISSING,  # bound in G1LiftObjStateRL.setup_env_config (see below)
        resampling_time_range=(5.0, 5.0),
        debug_vis=False,
        ranges=mdp.UniformPoseCommandCfg.Ranges(
            pos_x=(-0.05, 0.05), pos_y=(-0.20, -0.05), pos_z=(0.3, 0.35),
            roll=(0.0, 0.0), pitch=(0.0, 0.0), yaw=(0.0, 0.0),
        ),
    )

The task samples a target object pose through the object_pose command. In G1LiftObjStateRL (shown below), the command's body_name is bound to the robot's end-effector link, e.g. right_wrist_yaw_link.

Policy Observations

@configclass
class G1LiftObjStateObsCfg(RlBasePolicyObservationCfg):
    joint_pos = ObsTerm(func=mdp.joint_pos_rel)
    joint_vel = ObsTerm(func=mdp.joint_vel_rel)
    target_object_position = ObsTerm(
        func=mdp.generated_commands, params={"command_name": "object_pose"}
    )
    actions = ObsTerm(func=mdp.last_action)
    image_hand = ObsTerm(
        func=mdp.image_features,
        params={"sensor_cfg": SceneEntityCfg("hand_camera"), "data_type": "rgb",
                "model_name": "resnet18", "model_device": "cuda:0"},
    )
    rgb_d435 = ObsTerm(
        func=mdp.image_features,
        params={"sensor_cfg": SceneEntityCfg("d435_camera"), "data_type": "rgb",
                "model_name": "resnet18", "model_device": "cuda:0"},
    )

    def __post_init__(self):
        self.enable_corruption = True
        self.concatenate_terms = True

Notes:

  • Proprioceptive terms use relative joint pos/vel
  • generated_commands injects the sampled object_pose command
  • Two image feature terms are provided (hand camera and D435)
  • Observations are concatenated for the policy

Rewards

@configclass
class RewardsCfg:
    reaching_object = RewTerm(func=mdp.object_ee_distance, params={"std": 0.1}, weight=1.0)
    lifting_object = RewTerm(func=mdp.object_is_lifted, params={"minimal_height": 0.98}, weight=15.0)
    lifting_object_grasped = RewTerm(
        func=mdp.object_is_lifted_grasped,
        params={"grasp_threshold": 0.08, "velocity_threshold": 0.15},
        weight=8.0,
    )
    gripper_close_action_reward = RewTerm(
        func=mdp.gripper_close_action_reward,
        weight=1.0,
        params={"asset_cfg": SceneEntityCfg("robot", joint_names=["right_hand_.*"])},
    )
    object_goal_tracking = RewTerm(
        func=mdp.object_goal_distance,
        params={"std": 0.3, "minimal_height": 0.97, "command_name": "object_pose"},
        weight=16.0,
    )
    action_rate = RewTerm(func=mdp.action_rate_l2, weight=-1e-4)
    joint_vel = RewTerm(func=mdp.joint_vel_l2, weight=-1e-4, params={"asset_cfg": SceneEntityCfg("robot")})

Notes:

  • Dense shaping includes proximity (object_ee_distance) and goal tracking (object_goal_distance)
  • Success-related terms check lifting height and near-grasp conditions (see the term sketch after this list)
  • Regularizers penalize fast actions and joint velocities
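
Reward terms like those above are plain functions that receive the environment and return a per-environment tensor; the reward manager scales each term by its configured weight and sums them. As a rough sketch of the pattern (modeled on the standard Isaac Lab manager-based API; the actual mdp.object_is_lifted in lw_benchhub_rl may differ):

import torch

from isaaclab.assets import RigidObject
from isaaclab.envs import ManagerBasedRLEnv
from isaaclab.managers import SceneEntityCfg


def object_is_lifted(
    env: ManagerBasedRLEnv,
    minimal_height: float,
    object_cfg: SceneEntityCfg = SceneEntityCfg("object"),
) -> torch.Tensor:
    """Return 1.0 for every environment whose object is above minimal_height (world z)."""
    obj: RigidObject = env.scene[object_cfg.name]
    return (obj.data.root_pos_w[:, 2] > minimal_height).float()

Shaped terms such as object_ee_distance and object_goal_distance follow the same signature but typically return smooth values (for example, a distance kernel controlled by the std parameter) instead of a binary indicator.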

Events and Curriculum

@configclass
class EventCfg:
    reset_all = EventTerm(func=mdp.reset_scene_to_default, mode="reset")
    reset_object_position = EventTerm(
        func=mdp.reset_root_state_uniform, mode="reset",
        params={
            "pose_range": {"x": (-0.05, 0.05), "y": (-0.05, 0.05), "z": (0.0, 0.0), "yaw": (0.0, 90.0)},
            "velocity_range": {},
            "asset_cfg": SceneEntityCfg("object", body_names="apple_11"),
        },
    )


@configclass
class CurriculumCfg:
    # After 36k steps, strengthen the regularization penalties (from -1e-4 to -5e-3).
    action_rate = CurrTerm(func=mdp.modify_reward_weight, params={"term_name": "action_rate", "weight": -5e-3, "num_steps": 36000})
    joint_vel = CurrTerm(func=mdp.modify_reward_weight, params={"term_name": "joint_vel", "weight": -5e-3, "num_steps": 36000})

Compose the RL config

@rl_on(task=LiftObj)
@rl_on(embodiment="G1-RL")
class G1LiftObjStateRL(LwRL):
    def __init__(self):
        super().__init__()
        # define or reuse cfgs here
        self.rewards_cfg = RewardsCfg()
        self.events_cfg = EventCfg()
        self.curriculum_cfg = CurriculumCfg()
        self.commands_cfg = CommandsCfg()
        self.policy_observation_cfg = G1LiftObjStateObsCfg()
        self.resample_objects_placement_on_reset = False
        self.resample_robot_placement_on_reset = False

    def setup_env_config(self, orchestrator):
        super().setup_env_config(orchestrator)
        # bind the command to the G1 right end-effector link
        orchestrator.task.commands_cfg.object_pose.body_name = "right_wrist_yaw_link"

Terminations

BaseRLEnvCfg includes a default time_out termination. The training scripts disable the time-limit termination at runtime and handle episode rollouts explicitly.
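
For reference, a time-out termination in the manager-based API usually looks like the sketch below (assuming Isaac Lab's standard mdp.time_out term; BaseRLEnvCfg's actual definition is not shown on this page):

from isaaclab.envs import mdp
from isaaclab.managers import TerminationTermCfg as DoneTerm
from isaaclab.utils import configclass


@configclass
class TerminationsCfg:
    # Episode ends when the configured episode length is reached; time_out=True marks
    # it as a truncation (time limit) rather than a failure termination.
    time_out = DoneTerm(func=mdp.time_out, time_out=True)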


Registration and presets

RL environment registration

File: lw_benchhub_rl/__init__.py

gym.register(
    id="Robocasa-Rl-RobotTaskRL",
    entry_point="isaaclab.envs:ManagerBasedRLEnv",
    kwargs={
        "env_cfg_entry_point": f"{__name__}.lift_obj.lift_obj:G1LiftObjStateRL",
        "skrl_cfg_entry_point": f"{lift_obj_agents.__name__}:skrl_ppo_cfg.yaml",
        "rsl_rl_cfg_entry_point": f"{lift_obj_agents.__name__}:rsl_rl_ppo_cfg:PPORunnerCfg",
    },
    disable_env_checker=True,
)

This registers the composed environment under a Gym ID and links the SKRL agent YAML (along with an RSL-RL runner entry point).

Global preset: rl_base.yml

File: configs/rl/skrl/rl_base.yml

Important fields:

  • disable_fabric (if false ⇒ env uses fabric; faster sim I/O)
  • num_envs, layout, usd_simplify, robot_scale, first_person_view
  • rl: Specifies the RL config name binding for the robot-task combination.
  • Logging/video controls: video, video_length, video_interval
  • Training controls: seed, distributed, max_iterations
  • ML framework and algorithm selection: ml_framework, algorithm
  • Checkpoint: checkpoint, use_pretrained_checkpoint, real_time
  • enable_cameras: enable cameras in the env (required by Visual variants)
  • variant: RL variant name (e.g., State or Visual), used when registering RL configs

Task preset: g1_liftobj.yaml

File: configs/rl/skrl/g1_liftobj.yaml

Overrides and bindings:

  • _base_: rl_base
  • task: LiftObj
  • robot: G1-RL
  • layout: robocasakitchen-1-8
  • rl: G1LiftObjStateRL
  • num_envs: 200
  • usd_simplify: true
  • variant: State (defaults to the State-based LiftObj)
  • enable_cameras: false (set true for Visual variant)

SKRL agent configuration (LiftObj)

File: lw_benchhub_rl/lift_obj/agents/skrl_ppo_cfg.yaml

Key sections:

  • models: network definitions for the policy and value function; GaussianMixin is used for continuous actions.
  • memory: rollout buffer backend (RandomMemory); memory_size: -1 sizes the buffer to the agent's rollouts.
  • agent (PPO): core hyperparameters (see the mapping sketch after this list):
    • rollouts, learning_epochs, mini_batches
    • discount_factor (gamma), lambda (GAE lambda)
    • learning_rate and scheduler (KLAdaptiveLR with kl_threshold)
    • ratio_clip, value_clip, entropy_loss_scale, value_loss_scale
    • rewards_shaper_scale: multiplies rewards before updates
    • experiment: logging/checkpoint frequency and directory
  • trainer: training driver and timesteps for total steps
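
The agent block corresponds directly to skrl's PPO hyperparameter dictionary. A minimal sketch of that mapping is shown below; all numeric values are illustrative placeholders, not the values shipped in skrl_ppo_cfg.yaml, and the training script normally builds this configuration from the YAML automatically.

from skrl.agents.torch.ppo import PPO_DEFAULT_CONFIG
from skrl.resources.schedulers.torch import KLAdaptiveLR

cfg = PPO_DEFAULT_CONFIG.copy()
cfg["rollouts"] = 24                    # env steps collected per update (illustrative)
cfg["learning_epochs"] = 8
cfg["mini_batches"] = 4
cfg["discount_factor"] = 0.99           # gamma
cfg["lambda"] = 0.95                    # GAE lambda
cfg["learning_rate"] = 1.0e-4
cfg["learning_rate_scheduler"] = KLAdaptiveLR
cfg["learning_rate_scheduler_kwargs"] = {"kl_threshold": 0.008}
cfg["ratio_clip"] = 0.2
cfg["value_clip"] = 0.2
cfg["entropy_loss_scale"] = 0.01
cfg["value_loss_scale"] = 1.0
cfg["rewards_shaper"] = lambda rewards, *args: rewards * 0.1   # rewards_shaper_scale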

Practical tips:

  • Effective batch = num_envs * rollouts (per update); increase it as GPU memory allows (see the worked example after this list).
  • Use the KL-based LR scheduler (KLAdaptiveLR) to stabilize large-batch PPO.
  • Increase entropy_loss_scale for more exploration early on, then tune it back down.
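
As a quick worked example, with num_envs: 200 from g1_liftobj.yaml and assumed values rollouts = 24 and mini_batches = 4 (illustrative; the real values come from the agent YAML):

num_envs = 200        # from g1_liftobj.yaml
rollouts = 24         # illustrative; taken from skrl_ppo_cfg.yaml in practice
mini_batches = 4      # illustrative

batch_size = num_envs * rollouts              # 4800 transitions per PPO update
minibatch_size = batch_size // mini_batches   # 1200 samples per gradient step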

Training and evaluation

Default run via scripts:

bash train.sh   # uses configs/rl/skrl/g1_liftobj_state.yaml by default
bash eval.sh # evaluate the latest checkpoint

Or directly via Python:

python ./lw_benchhub/scripts/rl/train.py --task_config g1_liftobj_state --headless 
python ./lw_benchhub/scripts/rl/play.py --task_config g1_liftobj_state

Troubleshooting

  • OOM / low FPS: reduce num_envs, disable cameras, set usd_simplify: true.
  • Instability: lower learning_rate, increase learning_epochs, reduce mini_batches.
  • Poor exploration: increase entropy_loss_scale or relax reward shaping initially.
  • Reward plateaus: verify object_pose command ranges, minimal_height thresholds, and camera feature devices.